KVL-BERT: Knowledge Enhanced Visual-and-Linguistic BERT for visual commonsense reasoning

Authors

Abstract

Reasoning is a critical ability towards complete visual understanding. To develop machines with cognition-level understanding and reasoning abilities, the visual commonsense reasoning (VCR) task has been introduced. In VCR, given a challenging question about an image, a machine must answer correctly and then provide a rationale justifying its answer. Methods adopting the powerful BERT model as the backbone for learning a joint representation of image content and natural language have shown promising improvements on VCR. However, none of the existing methods has utilized commonsense knowledge in visual commonsense reasoning, which we believe will be greatly helpful in this task. Therefore, we incorporate commonsense knowledge into the cross-modal BERT and propose a novel Knowledge Enhanced Visual-and-Linguistic BERT (KVL-BERT for short) model. Besides taking visual and linguistic contents as input, KVL-BERT integrates external commonsense knowledge into the multi-layer Transformer. In order to preserve the structural information and semantic representation of the original sentence, we propose an algorithm called RMGSR (Relative-position-embedding Mask-self-attention Guided Semantic Representations). Compared with other task-specific models and general task-agnostic pre-training models, our KVL-BERT outperforms them.
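
The "mask-self-attention" half of RMGSR can be illustrated with a generic visibility-masked attention in the style of K-BERT. This is a minimal sketch under stated assumptions, not the authors' RMGSR: the token layout (knowledge tokens appended after the sentence and grouped into one branch per anchor token), the hypothetical helper names `build_visibility` and `masked_attention`, and the rule that a knowledge token sees only its anchor and its own branch are all illustrative choices. The relative-position-embedding half is not shown.

```python
import math
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def build_visibility(n_sentence: int, branches: Dict[int, int]) -> List[List[bool]]:
    """Visible matrix for a sentence plus injected knowledge tokens.

    Hypothetical layout (an assumption, not from the paper): indices
    0..n_sentence-1 are the original sentence; for each (anchor, length) in
    `branches`, in insertion order, `length` knowledge tokens are appended
    after the sentence and attached to token `anchor`.
    """
    n = n_sentence + sum(branches.values())
    visible = [[False] * n for _ in range(n)]
    # Original sentence tokens attend to one another as in ordinary BERT.
    for i in range(n_sentence):
        for j in range(n_sentence):
            visible[i][j] = True
    start = n_sentence
    for anchor, length in branches.items():
        members = [anchor] + list(range(start, start + length))
        # A knowledge token sees only its anchor and its own branch, and the
        # anchor sees its branch; the rest of the sentence never sees it.
        for i in members:
            for j in members:
                visible[i][j] = True
        start += length
    return visible


def masked_attention(q: Matrix, k: Matrix, v: Matrix,
                     visible: List[List[bool]]) -> Tuple[Matrix, Matrix]:
    """Single-head scaled dot-product attention with a visibility mask."""
    d = len(q[0])
    outputs: Matrix = []
    weights: Matrix = []
    for i, qi in enumerate(q):
        # Invisible pairs get a score of -inf, i.e. zero weight after softmax.
        scores = [
            sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d)
            if visible[i][j] else -math.inf
            for j, kj in enumerate(k)
        ]
        m = max(scores)  # finite, because every token can see itself
        exp_scores = [math.exp(s - m) for s in scores]
        total = sum(exp_scores)
        row = [e / total for e in exp_scores]
        weights.append(row)
        outputs.append([sum(w * vj[c] for w, vj in zip(row, v))
                        for c in range(len(v[0]))])
    return outputs, weights


if __name__ == "__main__":
    # Three sentence tokens; token 1 receives two knowledge tokens (3 and 4).
    vis = build_visibility(3, {1: 2})
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [0.2, 0.1]]
    out, w = masked_attention(x, x, x, vis)
    print(w[0][3], w[3][0])  # 0.0 0.0: the two tokens are mutually invisible
```

The design point is that injected knowledge can still update the representation of the word it is attached to, while unrelated sentence tokens receive exactly zero attention weight from it. In a full model a mask like this would presumably be applied inside each self-attention layer, alongside whatever position scheme keeps the original word order intact.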

Similar sources

Know2Look: Commonsense Knowledge for Visual Search

With the rise in popularity of social media, images accompanied by contextual text form a huge section of the web. However, search and retrieval of documents are still largely dependent on solely textual cues. Although visual cues have started to gain focus, the imperfections in object/scene detection do not lead to significantly improved results. We hypothesize that the use of background common...

Cis Schut, Bert Bredeweg

Constructing a qualitative model of some device usually proceeds as a cycle of model formulation and model debugging. The latter is driven by discrepancies between the behaviour predicted by the model and the actual device behaviour. This paper describes how the elimination of one type of discrepancy, incorrectly predicted derivatives, can be supported. It provides an analysis of the knowledg...

My friend Bert.

It had been a busy morning in the hospital with surgeries, rounds, and outpatient department visits. I enjoyed the few minutes of solitude as I drove to the nursing home to see my friend Bert. He sat, slouched over, in a geriatric chair and ever so slowly sat up when I spoke to him. His expressionless face could no longer welcome me, yet his trembling hands reached out to grasp mine. I sat clos...

Bert Bongers, An Interview with Sensorband

in Berlin, Edwin had the idea to form a trio. Sensorband’s first performance was in December of 1993, at Voyages Virtuels, a virtual reality exhibit organized by Les Virtualistes in Paris. I interviewed Sensorband in The Hague in November 1996, and thereafter the discussion was extended through electronic mail. The topics of the interview included: how they established an ensemble based entirel...

Journal

Journal title: Knowledge-Based Systems

Year: 2021

ISSN: 1872-7409, 0950-7051

DOI: https://doi.org/10.1016/j.knosys.2021.107408